Authors: Mauro Venticinque, Angelo Schillaci, Daniele Tambone
GitHub project: Bank-Marketing
Date: 2025-03-24
Here we will write some information about the project.
datatable(head(train, 100), options = list(scrollX = TRUE))
str(train)
## 'data.frame': 32950 obs. of 22 variables:
## $ X : int 35248 39854 14530 27822 40199 21227 16836 39099 38565 38152 ...
## $ age : int 30 39 43 27 56 41 57 46 61 35 ...
## $ job : chr "blue-collar" "technician" "services" "student" ...
## $ marital : chr "married" "married" "single" "single" ...
## $ education : chr "professional.course" "university.degree" "high.school" "high.school" ...
## $ default : chr "no" "no" "no" "no" ...
## $ housing : chr "no" "yes" "no" "yes" ...
## $ loan : chr "no" "no" "no" "no" ...
## $ contact : chr "cellular" "cellular" "cellular" "cellular" ...
## $ month : chr "may" "jun" "jul" "mar" ...
## $ day_of_week : chr "fri" "mon" "tue" "thu" ...
## $ duration : int 1357 713 1317 80 230 697 1441 679 106 234 ...
## $ campaign : int 4 2 4 4 2 2 2 1 2 1 ...
## $ pdays : int 999 999 999 999 999 999 999 999 999 999 ...
## $ previous : int 1 0 0 0 1 0 0 0 1 0 ...
## $ poutcome : chr "failure" "nonexistent" "nonexistent" "nonexistent" ...
## $ emp.var.rate : num -1.8 -1.7 1.4 -1.8 -1.7 1.4 1.4 -3 -3.4 -3.4 ...
## $ cons.price.idx: num 92.9 94.1 93.9 92.8 94.2 ...
## $ cons.conf.idx : num -46.2 -39.8 -42.7 -50 -40.3 -36.1 -42.7 -33 -26.9 -29.8 ...
## $ euribor3m : num 1.25 0.72 4.96 1.65 0.87 ...
## $ nr.employed : num 5099 4992 5228 5099 4992 ...
## $ subscribed : chr "yes" "yes" "yes" "yes" ...
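The `str()` output shows that every categorical variable is stored as `chr`. Many modelling and plotting functions expect factors, so a conversion step may be useful before going further. The following is a minimal sketch on a toy data frame (`toy` is a hypothetical stand-in for `train`, filled with the first few values printed above); it is not necessarily how the authors handle types.

```r
# Hypothetical stand-in for `train`: a few columns and rows from the str() output
toy <- data.frame(
  age     = c(30L, 39L, 43L, 27L),
  job     = c("blue-collar", "technician", "services", "student"),
  marital = c("married", "married", "single", "single"),
  stringsAsFactors = FALSE
)

# Find the character columns and convert only those to factors
chr_cols <- vapply(toy, is.character, logical(1))
toy[chr_cols] <- lapply(toy[chr_cols], factor)

str(toy)
```

If a step like this is applied to `train`, it should run before `attach(train)`, because attached copies do not pick up later changes to the data frame.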
attach(train)
X (Integer): row index of the observation in the source file; apparently an identifier rather than a customer attribute
age (Integer): age of the customer
job (Categorical): occupation
marital (Categorical): marital status
education (Categorical): education level
default (Binary): has credit in default?
housing (Binary): has housing loan?
loan (Binary): has personal loan?
contact (Categorical): contact communication type
month (Categorical): last contact month of year
day_of_week (Categorical): last contact day of the week
duration (Integer): last contact duration, in seconds. Important note: this attribute strongly affects the target (e.g., if duration = 0 then subscribed = 'no'). However, the duration is not known before a call is made, and once the call ends the outcome is known. This input should therefore be included only for benchmarking and discarded if the goal is a realistic predictive model
campaign (Integer): number of contacts performed during this campaign and for this client (includes the last contact)
pdays (Integer): number of days since the client was last contacted in a previous campaign (999 means the client was not previously contacted)
previous (Integer): number of contacts performed before this campaign and for this client
poutcome (Categorical): outcome of the previous marketing campaign ('failure', 'nonexistent', 'success')
subscribed (Binary): has the client subscribed to a term deposit?
Source: UCI Machine Learning Repository
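Two points from the variable descriptions affect how the data can be used for a realistic model: `duration` leaks the outcome, and the `str()` output shows `pdays` filled with 999, which is assumed here to be the "not previously contacted" sentinel. A minimal sketch of both adjustments follows, on a hypothetical data frame `toy`; the first four rows use values from the `str()` output, and the last row is invented to show a non-999 `pdays`.

```r
# Hypothetical stand-in for `train`; the last row is invented for illustration
toy <- data.frame(
  age        = c(30L, 39L, 43L, 27L, 56L),
  duration   = c(1357L, 713L, 1317L, 80L, 230L),
  pdays      = c(999L, 999L, 999L, 999L, 3L),
  previous   = c(1L, 0L, 0L, 0L, 1L),
  subscribed = c("yes", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Keep the "was contacted before" information as an explicit flag,
# then replace the assumed sentinel 999 with NA so it is not read as 999 days
toy$contacted_before <- toy$pdays != 999L
toy$pdays[toy$pdays == 999L] <- NA_integer_

# Feature set for a realistic model: everything except the leaky `duration`
realistic <- toy[, setdiff(names(toy), "duration"), drop = FALSE]

names(realistic)
```

A benchmark model can still be fitted on the full data frame, so that the effect of including `duration` is visible by comparison.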
vis_dat(train)
corrplot(cor(train[, c("X", "age", "duration", "campaign", "pdays", "previous", "emp.var.rate", "cons.price.idx", "cons.conf.idx", "euribor3m", "nr.employed")]), method="pie")
plot_ly(train, x = ~job, y = ~age, type = 'box', color = ~job)
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors
plot_ly(train, x = ~subscribed, type = 'histogram')
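The histogram shows the class balance of the target visually; the same information can be read as numbers with `table()` and `prop.table()`. The sketch below uses an invented vector `subscribed_toy` (the real proportions in `train` are not reproduced here), so only the technique carries over.

```r
# Invented target values, used only to illustrate the computation
subscribed_toy <- c("no", "no", "no", "no", "yes")

# Absolute counts per class
counts <- table(subscribed_toy)

# Relative frequencies per class (sum to 1)
props <- prop.table(counts)

print(counts)
print(round(props, 2))
```

On the real data the call would be `prop.table(table(train$subscribed))`. A strongly unbalanced result would suggest that plain accuracy is a poor evaluation metric.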
1.1.3 Social and economic context attributes
emp.var.rate (Numeric): employment variation rate - quarterly indicator
cons.price.idx (Numeric): consumer price index - monthly indicator
cons.conf.idx (Numeric): consumer confidence index - monthly indicator
euribor3m (Numeric): Euribor 3-month rate - daily indicator
nr.employed (Numeric): number of employees - quarterly indicator